[Tests] Diverse Whisper fixes #33665
Conversation
LMK when you want me to review @ylacombe (please fix the tests and the conflicts first 😊)

Hey @LysandreJik, could you review now? It seems that the failing check is unrelated to Whisper!
LysandreJik left a comment:
I'm not totally up to date with the changes in generation_whisper, so I'll trust you with them; if you want a double check, feel free to cc @ArthurZucker, but if you're confident you can go ahead and merge.

cc @ArthurZucker for a quick look!
ArthurZucker left a comment:
LGTM, let's just run the slow tests before we merge? 🤗

cc @ydshieh, there's still this error when doing the slow tests:

Could you rebase and trigger again with `[run-slow] whisper`?

It's working, thanks!

The list of failing tests is as expected, merging!

Great work! I noticed there's a small issue in this PR, especially when […]. To fix this quickly, you can update to:

```python
cross_attentions.append(torch.cat([x[i] for x in generate_outputs.cross_attentions], dim=2)[:, :, num_input_ids - 3:, :])
```
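For context, here is a minimal runnable sketch of what that one-liner does, with toy tensors standing in for real generation outputs. The names `generate_outputs`, `num_input_ids`, and the layer loop are assumptions reconstructed from the snippet, not code from this PR:

```python
import torch

# Toy stand-ins (hypothetical shapes) for the objects in the snippet above:
# `generate()` with output_attentions=True returns cross_attentions as a
# tuple (one entry per generated token) of tuples (one entry per decoder
# layer) of tensors shaped (batch, heads, query_len, key_len).
num_input_ids = 4  # decoder prompt length
step_cross_attentions = [
    (torch.rand(1, 2, num_input_ids, 5),),  # step 1: whole prompt is processed
    (torch.rand(1, 2, 1, 5),),              # step 2: one new query position
    (torch.rand(1, 2, 1, 5),),              # step 3
]

cross_attentions = []
for i in range(1):  # a single decoder layer in this toy example
    # Concatenate the per-step attentions of layer `i` along the query axis,
    # then drop the leading prompt positions; the `num_input_ids - 3` offset
    # mirrors the suggested fix.
    stacked = torch.cat([x[i] for x in step_cross_attentions], dim=2)
    cross_attentions.append(stacked[:, :, num_input_ids - 3:, :])

print(cross_attentions[0].shape)  # torch.Size([1, 2, 5, 5])
```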

Hey @leng-yue, thanks for your message!
Basically […]
* fix beam indices in token_timestamps
* fix attention_mask in FA2
* correct translation example with the right example
* correct how some tests are using outputs + correct num_frames
* fix shortform batch prev cond tests
* make fix-copies
* make fix-copies
* take care of shifting beam indices
* [run-slow] whisper
* [run-slow] whisper
What does this PR do?
There are a lot of pending failing tests for Whisper. This PR addresses some of the issues:
* `decoder_input_ids` were once `forced_input_ids`. This had an impact on `beam_indices`: `beam_indices` has a length of `decoder_input_ids + potentially_generated_ids`, but doesn't take `decoder_input_ids` into account when keeping track of the indices. In other words, `beam_indices[0]` is really the beam index of the first generated token, not of `decoder_input_ids[0]`.
* The Flash-Attention 2 attention mask was causing an issue.
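As an illustration of the first bullet, here is a minimal sketch (hypothetical tensors and token ids, not code from this PR) showing why absolute positions must be shifted by the prompt length before indexing `beam_indices`:

```python
import torch

# Hypothetical Whisper-style forced decoder prompt (4 tokens) followed by
# 6 generated tokens. `beam_indices` only covers the *generated* tokens.
decoder_input_ids = torch.tensor([[50258, 50259, 50359, 50363]])
prompt_len = decoder_input_ids.shape[1]
beam_indices = torch.tensor([[0, 0, 1, 1, 0, 0]])  # one entry per generated token

def beam_for_position(pos: int) -> int:
    """Beam index for absolute sequence position `pos` (prompt + generated)."""
    # beam_indices[0] belongs to the first *generated* token, not to
    # decoder_input_ids[0], so absolute positions must be shifted.
    assert pos >= prompt_len, "prompt positions carry no beam index"
    return int(beam_indices[0, pos - prompt_len])

print(beam_for_position(prompt_len))  # beam of the first generated token -> 0
```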
The remaining work is done on the modeling tests. Note that some of these tests were failing for straightforward reasons (e.g. the output was a dict) and are actually still failing, but no longer for straightforward reasons. Debugging will be easier, though.
Note: with #33450 and this PR, we're down from 29 failing tests to 17.